AITopics | Question Answering

Collaborating Authors

Question Answering

"Questions are asked and answered every day. Question answering (QA) technology aims to deliver the same facility online. It goes further than the more familiar search based on keywords (as in Google, Yahoo, and other search engines), in attempting to recognize what a question expresses and to respond with an actual answer. This simplifies things for users in two ways. First, questions do not often translate into a simple list of keywords. ...Second, QA takes responsibility for providing answers, rather than a searchable list of links to potentially relevant documents (web pages), highlighted by snippets of text that show how the query matched the documents."
– from Bonnie Webber & Nick Webb. Question Answering. In The Handbook of Computational Linguistics and Natural Language Processing. Alexander Clark, Chris Fox, Shalom Lappin (Eds.). Wiley, 2010.

News Overviews Instructional Materials AI-Alerts Classics

Tool Augmented Spatiotemporal Reasoning for Streamlining Video Question Answering Task

Neural Information Processing SystemsJun-22-2026, 10:26:32 GMT

Video Question Answering (VideoQA) task serves as a critical playground for evaluating whether foundation models can effectively perceive, understand, and reason about dynamic real-world scenarios.

large language model, machine learning, question answering, (23 more...)

Neural Information Processing Systems

Country: Asia (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.95)
(2 more...)

Add feedback

Language-Bias-Resilient Visual Question Answering via Adaptive Multi-Margin Collaborative Debiasing

Neural Information Processing SystemsJun-22-2026, 05:26:47 GMT

Language bias in Visual Question Answering (VQA) arises when models exploit spurious statistical correlations between question templates and answers, particularly in out-of-distribution scenarios, thereby neglecting essential visual cues and compromising genuine multimodal reasoning. Despite numerous efforts to enhance the robustness of VQA models, a principled understanding of how such bias originates and influences model behavior remains underdeveloped. In this paper, we address this gap through a comprehensive empirical and theoretical analysis, revealing that modality-specific gradient imbalances, which originate from the inherent heterogeneity of multimodal data, lead to skewed feature fusion and biased classifier weights. To alleviate these issues, we propose a novel MultiMargin Collaborative Debiasing (MMCD) framework2, which adaptively integrates frequency-aware, confidence-aware, and difficulty-aware angular margins with a dynamic, difficulty-aware contrastive learning mechanism to reshape decision boundaries under biased training conditions. Extensive experiments across multiple challenging VQA benchmarks confirm the consistent superiority of our proposed MMCD over state-of-the-art baselines in combating language bias.

machine learning, natural language, question answering, (15 more...)

Neural Information Processing Systems

Country: Asia > China (0.68)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

941de7aa5976f372117725abd87c639a-Paper-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing SystemsJun-19-2026, 22:02:00 GMT

Existing Embodied Question Answering (EQA) benchmarks primarily focus on household environments, often overlooking safety-critical aspects and reasoning processes pertinent to industrial settings. This drawback limits the evaluation of agent readiness for real-world industrial applications. To bridge this, we introduce IndustryEQA, the first benchmark dedicated to evaluating embodied agent capabilities within safety-critical warehouse scenarios. Built upon the NVIDIA Isaac Sim platform, IndustryEQA provides high-fidelity episodic memory videos featuring diverse industrial assets, dynamic human agents, and carefully designed hazardous situations inspired by real-world safety guidelines. The benchmark includes rich annotations covering six categories: equipment safety, human safety, object recognition, attribute recognition, temporal understanding, and spatial understanding. Besides, it also provides extra reasoning evaluation based on these categories. Specifically, it comprises 971 question-answer pairs generated from small warehouse and 373 pairs from large ones, incorporating scenarios with and without human. We further propose a comprehensive evaluation framework, including various baseline models, to assess their general perception and reasoning abilities in industrial environments. IndustryEQA aims to steer EQA research towards developing more robust, safety-aware, and practically applicable embodied agents for complex industrial environments.

large language model, machine learning, question answering, (21 more...)

Neural Information Processing Systems

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.93)

Industry:

Health & Medicine (0.66)
Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(3 more...)

Add feedback

FOCUS: Internal MLLM Representations for Efficient Fine-Grained Visual Question Answering

Neural Information Processing SystemsJun-19-2026, 13:01:02 GMT

While Multimodal Large Language Models (MLLMs) offer strong perception and reasoning capabilities for image-text input, Visual Question Answering (VQA) focusing on small image details still remains a challenge. Although visual cropping techniques seem promising, recent approaches have several limitations: the need for task-specific fine-tuning, low efficiency due to uninformed exhaustive search, or incompatibility with efficient attention implementations. We address these shortcomings by proposing a training-free visual cropping method, dubbed FOCUS, that leverages MLLM-internal representations to guide the search for the most relevant image region. This is accomplished in four steps: first, we identify the target object(s) in the VQA prompt; second, we compute an object relevance map using the key-value (KV) cache; third, we propose and rank relevant image regions based on the map; and finally, we perform the fine-grained VQA task using the topranked region.

llava-1, natural language, question answering, (20 more...)

Neural Information Processing Systems

Country:

North America > United States (1.00)
Europe (1.00)

Genre:

Research Report > Experimental Study (1.00)
Overview (0.92)

Industry: Transportation (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.87)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.70)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.68)

Add feedback

6c7c9811d06b41b320b69abf37234f84-Paper-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing SystemsJun-18-2026, 05:02:06 GMT

To quantify this stagnation, we introduce LIVEVQA, the first-of-its-kind dataset featuring 107,143 samples and 12 categories data specifically designed to support research in both seeking and updating with live visual knowledge. Drawing from recent news articles, video platforms, and academic publications in April 2024-May 2025, LIVEVQA enables evaluation of how models handle latest visual information beyond their knowledge boundaries and how current methods help to update them. Our comprehensive benchmarking of 17 state-of-the-art MLLMs reveals significant performance gaps on content beyond knowledge cutoff, and tool-use or agentic visual seeking framework drastically gain an average of 327% improvement. Furthermore, we explore parameter-efficient fine-tuning (PEFT) methods to update MLLMs with new visual knowledge. We dive deeply to the critical balance between adapter capacity and model capability when updating MLLMs with new visual knowledge. All the experimental dataset and source code are publicly available at: https://livevqa.github.io.

large language model, machine learning, question answering, (20 more...)

Neural Information Processing Systems

Country:

Asia (0.92)
Europe (0.67)
North America > United States > California > Los Angeles County (0.27)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.92)

Industry:

Media > News (1.00)
Media > Film (1.00)
Law (1.00)
(7 more...)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Information Management (1.00)
Information Technology > Data Science (1.00)
(9 more...)

Add feedback

Diagnosing and Addressing Pitfalls in KG-RAG Datasets: Toward More Reliable Benchmarking

Neural Information Processing SystemsJun-17-2026, 18:58:11 GMT

Knowledge Graph Question Answering (KGQA) systems rely on high-quality benchmarks to evaluate complex multi-hop reasoning. However, despite their widespread use, popular datasets such as WebQSP and CWQ suffer from critical quality issues, including inaccurate or incomplete ground-truth annotations, poorly constructed questions that are ambiguous, trivial, or unanswerable, and outdated or inconsistent knowledge. Through a manual audit of 16 popular KGQA datasets--including WebQSPand CWQ--we find that the average factual correctness rate is only 57%. To address these issues, we introduce KGQAGen, an LLM-inthe-loop framework that systematically resolves these pitfalls. KGQAGencombines structured knowledge grounding, LLM-guided generation, and symbolic verification to produce challenging and verifiable QA instances. Using KGQAGen, we construct KGQAGen-10k, a 10K-scale benchmark grounded in Wikidata, and evaluate a diverse set of KG-RAG models. Experimental results demonstrate that even state-of-the-art systems struggle on this benchmark, highlighting its ability to expose limitations of existing models. Our findings advocate for more rigorous benchmark construction and position KGQAGen as a scalable framework for advancing KGQA evaluation 1.

large language model, machine learning, question answering, (19 more...)

Neural Information Processing Systems

Country:

Europe (1.00)
Asia (1.00)
North America > United States > Alabama (0.28)

Genre:

Research Report > New Finding (1.00)
Personal (1.00)

Industry:

Leisure & Entertainment > Sports (1.00)
Government > Regional Government (0.92)
Media (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(3 more...)

Add feedback

MAGNET: AMulti-agent Framework for Finding Audio-Visual Needles by Reasoning over Multi-Video Haystacks

Neural Information Processing SystemsJun-16-2026, 23:06:57 GMT

Large multimodal models (LMMs) have shown remarkable progress in audiovisual understanding, yet they struggle with real-world scenarios that require complex reasoning across extensive video collections. Existing benchmarks for video question answering remain limited in scope, typically involving one clip per query, which falls short of representing the challenges of large-scale, audiovisual retrieval and reasoning encountered in practical applications. To bridge this gap, we introduce a novel task named AVHaystacksQA, where the goal is to identify salient segments across different videos in response to a query and link them together to generate the most informative answer. To this end, we present AVHaystacks, an audio-visual benchmark comprising 3100 annotated QA pairs designed to assess the capabilities of LMMs in multi-video retrieval and temporal grounding task. Additionally, we propose a model-agnostic, multi-agent framework MAGNET to address this challenge, achieving up to 89% and 65% relative improvements over baseline methods on BLEU@4 and GPT evaluation scores in QA task on our proposed AVHaystacks. To enable robust evaluation of multi-video retrieval and temporal grounding for optimal response generation, we introduce two new metrics, STEM, which captures alignment errors between a ground truth and a predicted step sequence and MTGS, to facilitate balanced and interpretable evaluation of segment-level grounding performance.

large language model, machine learning, question answering, (18 more...)

Neural Information Processing Systems

Country:

Asia (0.46)
North America > United States (0.45)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Instructional Material (0.67)

Industry:

Education (0.92)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(3 more...)

Add feedback

PromptShift-CRC: Drift-Aware Conformal Risk Control for Foundation Models Under Prompt and Domain Shift

Opoku, Jeffery, Banahene, David

arXiv.org Machine LearningJun-16-2026

Foundation models are now used in settings where the prompts they receive can change quickly. Users change, topics change, policies change, and the model may suddenly face a kind of request that was rare in the calibration data. This makes fixed calibration risky. Conformal prediction and conformal risk control give model-agnostic ways to control error, but they work best when the calibration data still look like the future data. This paper develops PromptShift CRC, a drift-aware conformal risk control method for foundation-model outputs under prompt and domain shift. The method embeds prompts and responses, measures how far the current prompt stream has moved from the calibration pool, gives more weight to relevant or recent calibration examples, and updates the risk level online after observed violations. It reports three practical diagnostics: realized risk error, prompt drift, and effective calibration size. We give conditions under which the method controls risk up to terms for distribution mismatch and weighted quantile uncertainty. In a synthetic prompt-shift benchmark, static conformal risk control fails sharply after drift, while PromptShift-CRC gives the best coverage among the adaptive baselines considered. We then evaluate the same calibration layer on public benchmark derived streams for question answering, toxicity, summarization factuality, and long-context hallucination risk

large language model, machine learning, question answering, (17 more...)

arXiv.org Machine Learning

2606.15964

Country: North America > United States (0.46)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.35)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.30)

Add feedback

Facebook's new AI tools offer more of the same, with photo-editing and question-answering capabilities

EngadgetJun-15-2026, 17:44:23 GMT

Facebook's new AI tools offer more of the same, with photo-editing and question-answering capabilities Facebook's new AI tools offer more of the same, with photo-editing and question-answering capabilities Now you can ask a different chatbot which restaurant to try. Meta just announced a suite of AI tools for Facebook users. Nothing here looks especially new, but availability on Facebook could be of some use to certain power users. This is a standard chatbot that answers questions, with Meta using the example everyone uses when rolling out one of these tools. The company highlights a person asking the chatbot for nearby summer vacation spots. Meta does say that AI Mode pulls data from across its apps, like from Groups and Reels, so maybe the information provided will be slightly different than when asking about summer getaways via Gemini, Claude, Grok, ChatGPT and all the rest.

artificial intelligence, natural language, question answering, (9 more...)

Engadget

Industry:

Information Technology > Services (0.80)
Leisure & Entertainment > Games > Computer Games (0.76)
Consumer Products & Services > Travel (0.58)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)

Add feedback

Benchmarking Retrieval-Augmented Multimodal Generation for Document Question Answering

Neural Information Processing SystemsJun-15-2026, 07:57:37 GMT

Current document retrieval-augmented generation (DocRAG) Therefore, the number of female respondents who never listened to theradio is: Number of females who never listened = 2,001 * 0.557 = 1,115 methods remain limited by their text-centric approaches, frequently missing "text12": [ "The table provides a

large language model, machine learning, question answering, (22 more...)

Neural Information Processing Systems

Country: